Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method

Authors

  • Dongryeol Lee
  • Alexander G. Gray

Abstract

We propose a new fast Gaussian summation algorithm that achieves high accuracy on high-dimensional datasets. First, we extend the original fast multipole-type methods to use approximation schemes with both hard and probabilistic error bounds. Second, we use a new data structure called the subspace tree, which maps each data point in a node to a lower-dimensional representation determined by any linear dimension-reduction method, such as PCA. This data structure reduces the cost of each pairwise distance computation, which is the dominant cost in many kernel methods. Our algorithm guarantees a probabilistic relative error bound on each kernel sum. It applies to high-dimensional Gaussian summations, which are ubiquitous as the key computational bottleneck inside many kernel methods. We provide empirical speedup results on datasets ranging from low dimensionality up to 89 dimensions.

1 Fast Gaussian Kernel Summation

In this paper, we propose new computational techniques for efficiently approximating the following sum for each query point q_i ∈ Q:
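The abstract states that the method bounds the relative error of each kernel sum probabilistically. As a rough illustration of that idea only, the sketch below estimates a single Gaussian sum G(q) = Σ_j w_j exp(−‖q − r_j‖² / (2h²)) by uniformly sampling reference points. It stops once a normal-approximation (CLT) confidence interval falls within a relative tolerance eps. This is a generic Monte Carlo estimator, not the paper's Monte Carlo multipole method: it uses no tree, no subspace (PCA) projection, and no multipole expansions. The function names, the weights w_j, the 2h² bandwidth convention, and the parameters eps, alpha, batch and max_samples are all assumptions made for this example. Because the stopping rule relies on the CLT, its error guarantee is only approximate.

```python
import math
import random
from statistics import NormalDist


def gaussian_kernel(q, r, h):
    """Unnormalized Gaussian kernel exp(-||q - r||^2 / (2 h^2)) (assumed convention)."""
    d2 = sum((a - b) ** 2 for a, b in zip(q, r))
    return math.exp(-d2 / (2.0 * h * h))


def exact_sum(q, refs, weights, h):
    """Direct evaluation of G(q) = sum_j w_j K(q, r_j): one kernel call per reference point."""
    return sum(w * gaussian_kernel(q, r, h) for r, w in zip(refs, weights))


def monte_carlo_sum(q, refs, weights, h, eps=0.05, alpha=0.05,
                    batch=50, max_samples=None, rng=None):
    """Estimate G(q) by uniform sampling of reference points with replacement.

    Sampling stops once the CLT confidence half-width at level 1 - alpha is at
    most eps * estimate, so the relative error is approximately <= eps with
    probability about 1 - alpha. Returns (estimate, kernel evaluations used).
    Hypothetical helper for illustration, not the paper's algorithm.
    """
    if rng is None:
        rng = random.Random()
    n = len(refs)
    if max_samples is None:
        max_samples = n  # about n samples cost as much as the exact sum
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    total = total_sq = 0.0
    m = 0
    while m < max_samples:
        for _ in range(batch):
            j = rng.randrange(n)
            # n * w_j * K(q, r_j) is an unbiased estimate of G(q)
            x = n * weights[j] * gaussian_kernel(q, refs[j], h)
            total += x
            total_sq += x * x
            m += 1
        mean = total / m
        # Unbiased sample variance of the per-sample estimates
        var = max(total_sq / m - mean * mean, 0.0) * m / max(m - 1, 1)
        half_width = z * math.sqrt(var / m)
        if mean > 0.0 and half_width <= eps * mean:
            return mean, m
    # Tolerance not reached within the sampling budget: fall back to the exact sum
    return exact_sum(q, refs, weights, h), m + n
```

Sampling with replacement keeps each term n·w_j·K(q, r_j) an unbiased estimate of G(q). When reaching the target accuracy would take about as many samples as there are reference points, the sketch falls back to the exact sum, since direct evaluation then costs roughly the same.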

Similar resources

Ultrafast Monte Carlo for Kernel Estimators and Generalized Statistical Summations

Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Kernel estimators and other methods are burdened by these expensive computations. Exact evaluation is typically O(n^2) or higher, which severely limits application to large datasets. We present a multi-stage stratified Monte Carlo method for approximating such summations with probabilistic rel...

Lekner summations and Ewald summations for quasi-two dimensional systems

Using the specific model of a bilayer of classical charged particles (bilayer Wigner crystal), we compare the predictions for energies and pair distribution functions obtained by Monte Carlo simulations using three different methods available to treat the long range Coulomb interactions in systems periodic in two directions but bound in the third one. The three methods compared are: the Ewald m...

ASKIT: An Efficient, Parallel Library for High-Dimensional Kernel Summations

Kernel-based methods are a powerful tool in a variety of machine learning and computational statistics methods. A key bottleneck in these methods is computations involving the kernel matrix, which scales quadratically with the problem size. Previously, we introduced ASKIT, an efficient, scalable, kernel-independent method for approximately evaluating kernel matrix-vector products. ASKIT is base...

Monte Carlo simulations for clutter statistics in minefields: AP-mine-like-target buried near a dielectric object beneath 2-D random rough ground surfaces

A rigorous three-dimensional (3-D) electromagnetic model is developed to analyze the scattering from anti-personnel (AP) nonmetallic mine-like target when it is buried near a clutter object under two-dimensional (2-D) random rough surfaces. The steepest descent fast multipole method (SDFMM) is implemented to solve for the unknown electric and magnetic surface currents on the ground surface, on ...

Ultrafast Monte Carlo for Statistical Summations

Machine learning contains many computational bottlenecks in the form of nested summations over datasets. Computation of these summations is typically O(n^2) or higher, which severely limits application to large datasets. We present a multistage stratified Monte Carlo method for approximating such summations with probabilistic relative error control. The essential idea is fast approximation by sam...

Publication date: 2008